Women's Global Labor Market

Introduction¶


Women in the World Labor Force

Women are severely underrepresented in the global labor market: around 50% of women work or actively seek work for income, compared to 80% for men. If women are treated unequally in the labor market, this is not only an equity concern, but also a matter of economic efficiency.

The United Nations has declared 17 primary goals for the organization, a few of which are principally related to this very issue:

  • Goal 5: Gender Equality
  • Goal 8: Decent Work and Economic Growth
  • Goal 10: Reduce Inequalities (Between Countries)

We are the Equality Data Investigative Unit, a branch of the UN's gender-focused initiatives. We have been directed to investigate this large gap in the global labor market to determine the background behind it, which countries suffer from it most, and what its possible root causes are.

Claudia Goldin won the 2023 Nobel Prize in Economic Sciences for work that significantly advanced our understanding of gender disparities in the labor market.

A primary focus of her work was the “U-curve”, which traces women’s labor force participation over time (and thus over the course of industrial development) in the United States. Building upon her influential findings, our project aims to further explore and understand these gender disparities in the workforce, shedding light on the various factors and challenges that contribute to these differences.

- What are the trends in the gender gap in labor participation rates around the world?

Why? Answering this question will give us a baseline to guide the rest of our analysis.

- Why are labor market gender gaps so pervasive around the world today?

Why? The common understanding was that with economic growth and development of educational accessibility, the gender labor gap would lessen. We must find out if this is true, to validate our fundamental understanding of this economic principle. If this is not the case, the efforts that follow will guide global action in the correct direction.

- What explains the variation in the size of these gaps over time and across continents?

Why? Cultures, communities, and values differ widely across the world, so very different factors can be at play even where the outcomes look similar. This question allows us to analyze these issues at a more local level than the global scale, giving a more accurate analysis.

IN SUMMARY:

Answering these questions is of fundamental importance for prosperity. The allocation of labor is inefficient if workers are not assigned to the jobs best suited to their skills, let alone if they are not working at all. Such inefficiencies lead to large economic deadweight loss on a global scale. Reducing the gender gap in employment and improving the allocation of female talent could thus lead to significant increases in global GDP.

Dataset Description:

The dataset employed in this study is sourced from the World Bank Gender Data Portal, available for download as a CSV file. The original dataset encompasses 305,545 rows and 67 columns (4 identifier columns plus one column per year from 1960 to 2022), featuring 1,153 distinct quantitative and qualitative indicators across 265 countries and regions. For analysis purposes, the data has been arranged chronologically from 1991 to 2021 and countries have been categorized by continent. When converted to long format, the dataset consists of over 19 million rows (19,249,335).

Disclaimer on Gender Representation:

Our research follows Claudia Goldin's framework and the World Bank portal's data, both of which use a binary gender definition (female and male). While our study focuses on this binary context, we acknowledge and respect the broader spectrum of gender identities and the importance of gender equality. Our analysis is limited to the available data and does not overlook the value of a more inclusive approach to gender.

Choice for Heavier Grading on Data Processing or Data Analysis¶

In this project, we advocate for a stronger emphasis on data analysis in our grading criteria. Our approach centers on extracting significant insights from a comprehensive and intricate dataset, tailored for an extensive, longitudinal global analysis. Far surpassing mere data processing, our investigation delves into identifying the root causes and broader implications of gender gaps in labor participation. The sophistication of our analysis lies in its capacity to reveal not just patterns but also to provide meaningful, actionable insights, thereby underscoring the elevated caliber and depth of our data analytical methodology.

Initializing¶

Install libraries¶

In [1]:
#MOUNT DRIVE IN COLAB ENVIRONMENT
#drive.mount('/content/drive')
In [2]:
# INSTALL 'pycountry'
!pip install pycountry

# INSTALL 'pycountry-convert'
!pip install pycountry-convert

!pip install geopy
!pip install plotly --upgrade

Import necessary libraries and modules¶

The selection and importation of these libraries are driven by their functionality and relevance to our project's goals.

  1. Importing Pandas and NumPy: The pandas library is a cornerstone in data manipulation and analysis in Python, offering powerful data structures like DataFrames that make data cleaning, exploration, and transformation more efficient. numpy is another fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays.

  2. Importing Pycountry and Pycountry_convert: pycountry provides access to the ISO country, subdivision, language, script, and currency codes. Meanwhile, pycountry_convert is a library for converting country names between different country code standards. These libraries are particularly useful in handling and standardizing country data, ensuring consistency and ease of mapping between different datasets that may use varied country naming conventions.

  3. Importing Matplotlib and Plotly: matplotlib.pyplot is a plotting library that allows for the creation of static, animated, and interactive visualizations in Python. plotly.express is another visualization library that enables the creation of complex, interactive, and aesthetically pleasing plots. Both libraries are essential for data visualization, which is a critical part of data analysis, enabling the visual exploration of trends, patterns, and outliers.

  4. Importing SciPy's Curve_fit: The curve_fit function from the scipy.optimize module is an important tool for fitting a function to a set of data. It can be used to model relationships between variables and estimate parameters for predictive analysis.

  5. Importing Seaborn: seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. This library is particularly useful for creating more complex and aesthetically pleasing plots compared to matplotlib, especially for statistical data.

  6. Importing Geopy's Nominatim: The Nominatim class from the geopy.geocoders module is used for geocoding, which is the process of converting addresses into geographic coordinates. This can be particularly useful for projects that involve geographic data and require mapping or spatial analysis.

Each of these libraries is selected to fulfill specific roles in the project, from data manipulation and cleaning (Pandas, NumPy) to data visualization (Matplotlib, Seaborn, Plotly) and special functions like country code conversion and geocoding (Pycountry, Pycountry_convert, Geopy). Their inclusion at the beginning of the project sets the foundation for a wide range of data analysis and visualization tasks that will be performed in subsequent stages.
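To make the role of curve_fit more concrete, here is a minimal sketch that fits a quadratic (a simple U-shaped curve) to synthetic, noise-free points. The function name quadratic, the x range, and the coefficients are assumptions chosen for illustration; they are not results from the World Bank data.

```python
import numpy as np
from scipy.optimize import curve_fit


def quadratic(x, a, b, c):
    """Hypothetical U-shaped model: a * x^2 + b * x + c."""
    return a * x**2 + b * x + c


# Synthetic stand-in data (invented values, not taken from the dataset).
x = np.linspace(6.0, 11.0, 30)
y = quadratic(x, 2.0, -34.0, 190.0)

# curve_fit returns the fitted parameters and their covariance matrix.
params, covariance = curve_fit(quadratic, x, y)
a_hat, b_hat, c_hat = params

# For a quadratic, the bottom of the "U" sits at x = -b / (2a).
turning_point = -b_hat / (2 * a_hat)
print(params, turning_point)
```

With real data the points would be noisy, so the fitted parameters would only approximate any underlying curve, and the covariance matrix gives a rough sense of their uncertainty.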

In [3]:
# IMPORT NECESSARY LIBRARIES
import pandas as pd
import numpy as np
import pycountry
import pycountry_convert as pc
import matplotlib.pyplot as plt
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.express as px
from scipy.optimize import curve_fit
import seaborn as sns
from geopy.geocoders import Nominatim

Dataset explanation¶

  • This dataset, sourced from the Gender_StatsData.csv file, presents a comprehensive collection of data points that focus on various gender-related indicators across different countries and regions. Structured as a DataFrame using the pandas library in Python, it offers a nuanced view into the gender dynamics at play globally.

  • The dataset is organized into several columns, beginning with 'Country Name' and 'Country Code', which provide the geographical context for the data. These fields identify the specific country or region to which the data corresponds, ensuring that users can easily segregate and analyze the data geographically.

  • Central to this dataset are the 'Indicator Name' and 'Indicator Code' columns. These fields detail the specific gender-related metrics being measured, ranging from legal rights and societal norms to employment and education statistics. The indicators cover a broad spectrum of gender issues, reflecting the diverse challenges and situations encountered in different parts of the world.

  • The dataset spans a historical range from 1960 to 2022, with each year represented as a separate column. This temporal breadth allows for an analysis of trends and changes in gender dynamics over time, providing insights into how gender equality and women's rights have evolved.

  • However, a notable aspect of this dataset is the prevalence of missing values (NaNs) in the early years, reflecting the lack of consistent data collection in the past. Users of this dataset must be mindful of these gaps and consider appropriate methods for handling missing data in their analysis.

  • Overall, this dataset serves as a rich source for examining gender disparities and progression. By offering detailed, country-specific data across a multitude of indicators and over an extensive time period, it enables a deep and nuanced analysis of gender issues on a global scale.

In [4]:
# IMPORT CSV TO PANDAS DATAFRAME
gender_df = pd.read_csv(r'Gender_StatsData.csv')

# DISPLAY TOP 5 ROWS IN DATAFRAME
gender_df.head(5)
Out[4]:
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
0 Africa Eastern and Southern AFE A woman can apply for a passport in the same w... SG.APL.PSPT.EQ NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Africa Eastern and Southern AFE A woman can be head of household in the same w... SG.HLD.HEAD.EQ NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Africa Eastern and Southern AFE A woman can choose where to live in the same w... SG.LOC.LIVE.EQ NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 Africa Eastern and Southern AFE A woman can get a job in the same way as a man... SG.GET.JOBS.EQ NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 Africa Eastern and Southern AFE A woman can obtain a judgment of divorce in th... SG.OBT.DVRC.EQ NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 67 columns


DATA PROCESSING¶

Data Transformation: Melting Wide Data to Long Format in the gender_df DataFrame¶

In the next step of the data analysis, the original gender_df DataFrame, which is structured in a wide format, undergoes a transformation into a long format. This restructuring is achieved using the melt function from the pandas library. The primary rationale behind this transformation is to facilitate easier manipulation and analysis of the dataset, especially when dealing with time-series data or performing operations that require a more normalized data structure.

  1. Melting the DataFrame: The melt function is applied to gender_df, with the id_vars parameter set to include 'Country Name', 'Country Code', 'Indicator Name', and 'Indicator Code'. These columns are selected as identifiers because they provide the essential context for each data point, representing the geographical and thematic specifics of the indicators. All other columns in the DataFrame, which primarily consist of years (1960 to 2022), are treated as value variables. As a result of this melting process, the DataFrame transforms from a wide format, where each year has its own column, into a long format, where each row corresponds to a single year-indicator-country combination. This long format is generally more suitable for various data analysis tasks, as it aligns all value data into a single column, making it easier to filter, sort, and aggregate.

  2. Renaming the 'variable' Column: After the melting process, the new column that contains the years is initially named 'variable'. This name is not very descriptive, so it is renamed to 'Year' for better clarity. Renaming this column enhances the readability and understandability of the DataFrame, as 'Year' directly indicates the nature of the data within the column, which is essential for any subsequent temporal analysis.

This transformation step is crucial in preparing the dataset for more detailed analysis, especially when the goal is to examine trends over time or conduct comparisons across different countries and indicators. By converting the dataset into a long format, the analysis can now easily focus on specific years, indicators, or countries, paving the way for more precise and insightful data exploration.
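As a minimal sketch of this reshaping, the toy frame below is melted in the same way. The names toy_wide and toy_long and all values are invented for illustration. Passing var_name='Year' is an optional shortcut that names the year column during the melt, as an alternative to renaming 'variable' afterwards.

```python
import pandas as pd

# Toy wide-format frame: one column per year (values are invented).
toy_wide = pd.DataFrame({
    "Country Name": ["A", "B"],
    "Country Code": ["AAA", "BBB"],
    "Indicator Name": ["Some indicator", "Some indicator"],
    "Indicator Code": ["X.Y.Z", "X.Y.Z"],
    "1960": [1.0, None],
    "1961": [2.0, 3.0],
})

id_columns = ["Country Name", "Country Code", "Indicator Name", "Indicator Code"]

# var_name labels the melted year column directly, so no rename step is needed.
toy_long = toy_wide.melt(id_vars=id_columns, var_name="Year", value_name="value")

# Each (row, year column) pair becomes one row: 2 rows x 2 year columns = 4 rows.
print(toy_long.shape)
print(toy_long)
```

Note that the melted 'Year' values are strings, because they come from column labels. Converting them with astype(int) may be worthwhile before any numeric comparison of years.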

In [5]:
# MELT DATAFRAME INTO LONG FORMAT FROM WIDE FORMAT TO MAKE DATA MORE ACCESSIBLE
gender_df = gender_df.melt(id_vars=['Country Name','Country Code','Indicator Name','Indicator Code'])

# RENAME 'variable' TO 'Year' FOR CLARITY
gender_df.rename(columns={"variable":"Year"}, inplace=True)

gender_df
Out[5]:
Country Name Country Code Indicator Name Indicator Code Year value
0 Africa Eastern and Southern AFE A woman can apply for a passport in the same w... SG.APL.PSPT.EQ 1960 NaN
1 Africa Eastern and Southern AFE A woman can be head of household in the same w... SG.HLD.HEAD.EQ 1960 NaN
2 Africa Eastern and Southern AFE A woman can choose where to live in the same w... SG.LOC.LIVE.EQ 1960 NaN
3 Africa Eastern and Southern AFE A woman can get a job in the same way as a man... SG.GET.JOBS.EQ 1960 NaN
4 Africa Eastern and Southern AFE A woman can obtain a judgment of divorce in th... SG.OBT.DVRC.EQ 1960 NaN
... ... ... ... ... ... ...
19249330 Zimbabwe ZWE Worried about not having enough money for old ... fin44a1.d.2 2022 NaN
19249331 Zimbabwe ZWE Youth illiterate population, 15-24 years, % fe... UIS.LPP.AG15T24 2022 NaN
19249332 Zimbabwe ZWE Youth illiterate population, 15-24 years, both... UIS.LP.AG15T24 2022 NaN
19249333 Zimbabwe ZWE Youth illiterate population, 15-24 years, fema... UIS.LP.AG15T24.F 2022 NaN
19249334 Zimbabwe ZWE Youth illiterate population, 15-24 years, male... UIS.LP.AG15T24.M 2022 NaN

19249335 rows × 6 columns

Categorizing Countries by Continent in the 'gender_df' DataFrame¶

The code snippets below detail the process of categorizing the countries and regions in the gender_df DataFrame by continent. The code performs the following steps:

  1. Defining Country Lists: The code begins by defining several lists:

    • all_countries stores unique country names within the dataset.
    • asia, europe, africa, and south_america list specific countries or regions belonging to these continents.
  2. Categorization Function: The make_continent function is defined to categorize countries into continents based on a set of conditions:

    • Names listed in the asia, europe, south_america, and africa override lists are categorized accordingly.
    • For all other names, the code splits the name at the first comma to handle cases such as "Country, Subdivision". It then uses the pycountry_convert library (imported as pc) to look up the alpha-2 country code and, from it, the continent.
    • If that lookup fails, as it does for aggregate regions that are not specific countries, the name is categorized as 'ZZ_NONCOUNTRY'.
  3. Applying the Function: The make_continent function is applied to each name in all_countries to build a dictionary that stores country names as keys and their corresponding continents as values. This dictionary is then mapped onto the 'Country Name' column to create the 'Continent' column in the main gender_df DataFrame.

This code aids the analysis by providing a 'Continent' column, which allows the data to be explored in the context of continents. Given the number of countries in the dataset, it is easier to analyze trends across 6 continents (not 7, because Antarctica is not considered) than across 195 countries.

In [6]:
# FUNCTION THAT RETURNS CONTINENT FROM INPUTTED COUNTRY
def make_continent(country):
    # MANUAL OVERRIDES FOR NAMES THE AUTOMATIC LOOKUP CANNOT RESOLVE (LISTS DEFINED IN THE NEXT CELL)
    if country in asia:
        return 'Asia'
    elif country in europe:
        return 'Europe'
    elif country in south_america:
        return 'South America'
    elif country in africa:
        return 'Africa'
    try:
        # SPLIT COUNTRY NAME TO HANDLE SPECIAL CASES "Country, Subdivision".
        country = country.split(',')[0]
        country_alpha2 = pc.country_name_to_country_alpha2(country)
        country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
        return pc.convert_continent_code_to_continent_name(country_continent_code)
    except Exception:
        # LOOKUP FAILED: TREAT THE NAME AS A REGION OR AGGREGATE RATHER THAN A COUNTRY
        return 'ZZ_NONCOUNTRY'
In [7]:
# LIST OF ALL COUNTRIES IN DATASET
all_countries = gender_df['Country Name'].unique()

# SPECIFICALLY LISTING COUNTRIES THAT CAUSE ISSUES WITH LOGIC
asia = ["Hong Kong SAR, China",'Korea, Rep.','Lao PDR','Macao SAR, China','Timor-Leste','West Bank and Gaza','Turkiye']
europe = ['Kosovo']
africa = ["Cote d'Ivoire"]
south_america = ['Curacao']

# CREATE DICTIONARY WHICH MAPS EACH COUNTRY NAME TO ITS CONTINENT
country_continent_key = {}
for each in all_countries:
  country_continent_key[each] = make_continent(each)

The line below adds a new column named 'Continent' to the gender_df DataFrame by looking up each entry of the 'Country Name' column in the country_continent_key dictionary. The lookup is handled by the map method, which takes the dictionary as its argument and returns the corresponding continent for each country. As a result, the gender_df DataFrame is enriched with continent information for each row, facilitating continent-based analysis in subsequent steps.

In [8]:
# MAP COUNTRY CONTINENT KEY TO CREATE CONTINENT COLUMN
gender_df['Continent'] = gender_df['Country Name'].map(country_continent_key)

The below segment of code narrows the focus of the gender_df DataFrame to include only data from specified continents. Initially, it defines a list, desired_continents, containing six continents: Africa, Asia, Europe, North America, Oceania, and South America. The subsequent line filters gender_df, retaining rows where the 'Continent' column's value matches one of the continents in the desired_continents list. This is achieved using the isin method, which checks for matching continent names. The .copy() method at the end ensures the creation of an independent copy of the filtered DataFrame, avoiding potential issues related to modifying DataFrame slices. The result is a streamlined DataFrame focused solely on the regions of interest, setting the stage for more targeted and relevant analyses.

In [9]:
# LIST OF DESIRED CONTINENTS
desired_continents = ['Africa', 'Asia', 'Europe','North America','Oceania','South America']

# USING LIST, FILTER OUT ROWS NOT CONTAINED
gender_df = gender_df[gender_df['Continent'].isin(desired_continents)].copy()

gender_df.head()
Out[9]:
Country Name Country Code Indicator Name Indicator Code Year value Continent
55344 Afghanistan AFG A woman can apply for a passport in the same w... SG.APL.PSPT.EQ 1960 NaN Asia
55345 Afghanistan AFG A woman can be head of household in the same w... SG.HLD.HEAD.EQ 1960 NaN Asia
55346 Afghanistan AFG A woman can choose where to live in the same w... SG.LOC.LIVE.EQ 1960 NaN Asia
55347 Afghanistan AFG A woman can get a job in the same way as a man... SG.GET.JOBS.EQ 1960 NaN Asia
55348 Afghanistan AFG A woman can obtain a judgment of divorce in th... SG.OBT.DVRC.EQ 1960 NaN Asia
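The mapping and filtering steps above can be sketched on a toy frame. The names toy_key, toy_df, unmatched, and kept, and the entry "Atlantis", are hypothetical and used only for illustration; in the notebook the lookup role is played by country_continent_key.

```python
import pandas as pd

# Hypothetical lookup standing in for country_continent_key.
toy_key = {"Afghanistan": "Asia", "Kosovo": "Europe", "World": "ZZ_NONCOUNTRY"}

toy_df = pd.DataFrame({
    "Country Name": ["Afghanistan", "Kosovo", "World", "Atlantis"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Series.map looks each name up in the dict; names missing from it become NaN.
toy_df["Continent"] = toy_df["Country Name"].map(toy_key)

wanted = ["Asia", "Europe"]

# Listing the names that will be dropped is a cheap sanity check before filtering.
unmatched = toy_df.loc[~toy_df["Continent"].isin(wanted), "Country Name"].tolist()

# isin keeps only the wanted continents; .copy() returns an independent frame.
kept = toy_df[toy_df["Continent"].isin(wanted)].copy()

print(unmatched)
print(kept)
```

Inspecting a list like unmatched on the real data would show which names fall into 'ZZ_NONCOUNTRY', which may help catch genuine countries whose names the automatic lookup did not recognize.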

Data Refinement & Cleanup: Preparing DataFrame for Analysis¶

The below code continues to perform various operations on the gender_df DataFrame. Here's an explanation of each step:

  1. Set a Multi-Index: The code sets a multi-index on the DataFrame using the .set_index method, with 'Country Name' and 'Year' as the index levels. This organizes the data so that rows are grouped by country and year. The index is not unique: each country-year pair still has one row per indicator.

  2. Sort the DataFrame: After setting the multi-index, the code sorts the DataFrame based on the multi-index levels. It first sorts by 'Country Name' and then by 'Year'. Sorting the data in this way can be useful for data visualization and analysis tasks, ensuring that the data is organized in a meaningful order.

  3. Delete 'Country Code' Column: The code deletes the 'Country Code' column from the DataFrame using the del statement. Since 'Country Code' was likely used for identification and is no longer needed for analysis, removing it can help reduce unnecessary data and simplify the DataFrame.

  4. Remove Rows with Missing Values (NaN): The code uses the .dropna method to remove rows in the DataFrame that contain missing values (NaN). This step is essential for data quality, as it ensures that only rows with complete data are retained for analysis.

In [10]:
# CREATE HIERARCHICAL INDEX WITH COUNTRY NAME AND YEAR AS LEVEL 0 & 1 INDEX
gender_df.set_index(["Country Name","Year"], inplace=True)

# SORT DF BY INDEX
gender_df.sort_index(level=['Country Name','Year'], inplace=True)

# DELETE COUNTRY CODE COLUMN (NOT NEEDED)
del gender_df['Country Code']

# REMOVE ROWS CONTAINING NaN (WOULD INTERRUPT ANALYSIS)
gender_df.dropna(inplace=True)

The result of running the above code will be a modified version of the gender_df DataFrame with the following characteristics:

  • It has a multi-index with 'Country Name' and 'Year' as the index levels.
  • The DataFrame is sorted first by 'Country Name' and then by 'Year.'
  • The 'Country Code' column is deleted from the DataFrame.
  • Rows with missing values (NaN) are removed from the DataFrame.

This modified DataFrame is now in a structured and clean format, ready for further data analysis, visualization, and other tasks.

In [11]:
gender_df.head()
Out[11]:
Indicator Name Indicator Code value Continent
Country Name Year
Afghanistan 1960 Adolescent fertility rate (births per 1,000 wo... SP.ADO.TFRT 138.876000 Asia
1960 Age dependency ratio (% of working-age populat... SP.POP.DPND 80.051114 Asia
1960 Age population, age 0, female, interpolated SP.POP.AG00.FE.IN 178344.500000 Asia
1960 Age population, age 0, male, interpolated SP.POP.AG00.MA.IN 182281.000000 Asia
1960 Age population, age 01, female, interpolated SP.POP.AG01.FE.IN 151954.500000 Asia
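As the output above suggests, several rows share the same country and year. A minimal sketch on invented data (the frame toy and its values are hypothetical) shows the same set_index, sort_index, and dropna sequence and confirms that the resulting index is not unique:

```python
import pandas as pd

# Toy long-format frame with two indicators for the same country and year.
toy = pd.DataFrame({
    "Country Name": ["B", "A", "A", "A"],
    "Year": ["1961", "1960", "1960", "1961"],
    "Indicator Code": ["X", "X", "Y", "X"],
    "value": [1.0, 2.0, 3.0, None],
})

# Same sequence as the notebook: index, sort, then drop rows containing NaN.
toy = toy.set_index(["Country Name", "Year"]).sort_index().dropna()

# Two indicators share the ("A", "1960") key, so the index is not unique.
print(toy.index.is_unique)

# Selecting one country-year pair returns every indicator row for it.
print(toy.loc[("A", "1960"), :])
```

Because of this, a single (country, year) lookup generally returns a small table and has to be narrowed further by indicator.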

Data Analysis and Visualizations¶

Indicator retriever function¶

The retrieve_indicators function efficiently extracts specific data from the gender_df DataFrame based on a list of indicator codes. The process involves:

  1. Filtering Data: It first filters gender_df to include only rows that match the provided indicator codes.

  2. Reindexing: The filtered DataFrame is then reindexed with 'Continent', 'Country Name', and 'Year', aligning it for better geographical and temporal analysis.

  3. Sorting: The data is sorted according to the new multi-index, ensuring an organized structure.

  4. Returning the Result: The function returns this curated DataFrame, now primed for focused and efficient analysis of the specified indicators.

In [12]:
# FUNCTION WHICH RETURNS DF CONTAINING INDICATORS SPECIFIED IN A LIST
def retrieve_indicators(indicators):
    # FILTER AND RETRIEVE INDICATORS
    retrieved_df = gender_df[gender_df['Indicator Code'].isin(indicators)].reset_index()
    # SET MULTI INDEX
    retrieved_df = retrieved_df.set_index(['Continent', 'Country Name', 'Year'])
    # SORT BY INDEX
    retrieved_df = retrieved_df.sort_index(level=['Continent', 'Country Name', 'Year'])
    return retrieved_df

Categorizing indicators¶

In the below code, specific indicators related to education, employment, birth rates, and unemployment are first categorized into separate lists. These indicators are chosen to explore the interrelationships between different aspects of societal and economic development. The indicators are then combined into a single comprehensive list for a more holistic analysis.

In [13]:
# DEFINE RELEVANT INDICATORS TO BE USED
education_indicators = ['SE.PRM.ENRR', 'SE.PRM.NENR', 'SE.PRM.ENRR.FE', 'SE.PRM.NENR.FE',
                        'SE.SEC.ENRR', 'SE.SEC.NENR', 'SE.SEC.ENRR.FE', 'SE.SEC.NENR.FE',
                        'SE.ADT.1524.LT.FM.ZS', 'SE.ENR.PRIM.FM.ZS', 'SE.ENR.SECO.FM.ZS',
                        'SE.ENR.PRSC.FM.ZS']
employment_indicators = ['SL.TLF.CACT.FE.ZS', 'SL.TLF.BASC.FE.ZS', 'SL.TLF.BASC.MA.ZS',
                         'SL.TLF.BASC.ZS', 'SL.TLF.ADVN.ZS', 'SL.TLF.ADVN.FE.ZS',
                         'SL.TLF.ADVN.MA.ZS', 'SL.TLF.INTM.ZS', 'SL.TLF.INTM.FE.ZS',
                         'SL.TLF.INTM.MA.ZS', 'SL.TLF.CACT.MA.ZS']
birth_rate_indicators = ['SP.DYN.CBRT.IN', 'SH.DYN.STLB']
unemployment_indicators = ['SL.UEM.1524.FM.ZS', 'SL.UEM.1524.FM.NE.ZS']
u_curve_indicators = ['NY.GDP.PCAP.KD', 'SL.TLF.CACT.FE.ZS']

# COMBINE ALL INDICATORS INTO ONE LIST
all_relevant_indicators = education_indicators + employment_indicators + birth_rate_indicators + unemployment_indicators + u_curve_indicators

The next cell carries out the following steps:

  1. Retrieving and Pivoting Data: The retrieve_indicators function is used to filter the DataFrame for each set of indicators. The data is then pivoted to align 'Continent', 'Country Name', and 'Year' as indices and the indicators as columns. This restructuring transforms the data into a format that is more conducive to analysis, facilitating easy access to specific metrics across different geographical and temporal dimensions.

  2. Applying Log Transformation for Visualization: A base-2 logarithmic transformation (np.log2) is applied to the comprehensive DataFrame, producing all_indicators_df_log. This transformation is often used in data analysis to reduce skew or to handle wide-ranging values, making patterns more discernible and improving the interpretability of visualizations.

  3. Merging DataFrames for Correlation Analysis: The education and employment DataFrames are merged. This merging is pivotal for exploring the relationships between educational attainment and employment statistics. By analyzing these factors together, one can identify trends and correlations that might be obscured when examining them separately.

  4. Correlation Matrix Calculation: A correlation matrix is computed for the merged DataFrame. This matrix is a powerful tool in statistical analysis to quantify the degree and direction of the relationship between multiple variables. In this context, it helps to uncover insights into how different educational indicators might influence or relate to employment trends.

Each step of this code is deliberately designed to facilitate a comprehensive and insightful analysis of the dataset, aiming to reveal underlying patterns and relationships in the context of gender statistics. The methodical approach of categorizing, merging, and analyzing these indicators allows for a thorough exploration of the interdependencies between education, employment, and other socio-economic factors.
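The pivot and correlation steps described above can be sketched on a small invented frame. The indicator names "edu" and "emp" and all values are hypothetical stand-ins for the real indicator columns.

```python
import pandas as pd

# Toy long-format data: two indicators for one country over three years.
toy_long = pd.DataFrame({
    "Continent": ["Asia"] * 6,
    "Country Name": ["A"] * 6,
    "Year": ["2000", "2000", "2001", "2001", "2002", "2002"],
    "Indicator Name": ["edu", "emp"] * 3,
    "value": [10.0, 20.0, 20.0, 40.0, 30.0, 60.0],
})

# Pivot so each indicator becomes a column, keyed by continent, country, and year.
toy_wide = toy_long.pivot(
    index=["Continent", "Country Name", "Year"],
    columns="Indicator Name",
    values="value",
)

# Pairwise Pearson correlation between the indicator columns.
corr = toy_wide.corr()
print(toy_wide)
print(corr)
```

In this toy case "emp" is exactly twice "edu", so the two columns are perfectly correlated. Note that pivot raises an error if an index and column combination appears more than once, whereas pivot_table aggregates duplicates (by mean, unless told otherwise).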

In [14]:
# RETRIEVE AND FORMAT DATA AS PIVOT
education_df = retrieve_indicators(education_indicators).reset_index().pivot(index=['Continent', 'Country Name', 'Year'], columns='Indicator Name', values='value')
employment_df = retrieve_indicators(employment_indicators).reset_index().pivot(index=['Continent', 'Country Name', 'Year'], columns='Indicator Name', values='value')

# Retrieve and pivot data for all relevant indicators, apply log transformation for visualization
all_indicators_df = retrieve_indicators(all_relevant_indicators).reset_index().pivot(index=['Continent', 'Country Name', 'Year'], columns='Indicator Name', values='value')
all_indicators_df_log = all_indicators_df.applymap(np.log2)

# Merge education and employment dataframes for correlation analysis
edu_empl_df = pd.merge(education_df, employment_df, on=['Continent', 'Country Name', 'Year'], how='inner')

# Perform correlation analysis
edu_empl_correlation_matrix = edu_empl_df.corr()
C:\Users\shami\anaconda3\Lib\site-packages\pandas\core\frame.py:9651: RuntimeWarning: divide by zero encountered in log2
  return lib.map_infer(x.astype(object)._values, func, ignore_na=ignore_na)
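The RuntimeWarning above arises because some values are exactly zero, for which log2 returns -inf. One possible way to avoid this, sketched here on invented data, is to mask non-positive values before taking the logarithm. Treating zeros as missing is an assumption about the desired behavior, not something the notebook itself does; the frame toy is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame containing a zero and a missing value (values are invented).
toy = pd.DataFrame({"a": [0.0, 1.0, 8.0], "b": [2.0, np.nan, 4.0]})

# Assumption: zeros and negatives are treated as missing rather than as -inf.
# DataFrame.where keeps values where the condition holds and sets the rest to NaN.
toy_positive = toy.where(toy > 0)

# NumPy ufuncs work element-wise on whole DataFrames, so no applymap is needed.
toy_log = np.log2(toy_positive)
print(toy_log)
```

Applying np.log2 to the whole frame is also vectorized, which tends to be faster than an element-wise applymap, and it sidesteps applymap's deprecation in recent pandas releases.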

Understanding Gender Disparities in the Global Labor Market¶

In [15]:
# RELEVANT INDICATORS
selected_indicators = [
    'Wage and salaried workers, female (% of female employment) (modeled ILO estimate)',
    'Wage and salaried workers, male (% of male employment) (modeled ILO estimate)',]

filtered_data = gender_df[gender_df['Indicator Name'].isin(selected_indicators)]

# FILTER DATA
pivot_df = filtered_data.pivot_table(index=['Year', 'Continent'], columns='Indicator Name', values='value').reset_index()

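# CALCULATE DIFFERENCE (male minus female)
# pivot_table sorts the indicator columns alphabetically, so column 2 is female and column 3 is male
pivot_df['Difference'] = -(pivot_df.iloc[:, 2] - pivot_df.iloc[:, 3])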

# PLOT LINE CHART
plt.figure(figsize=(20, 8))
sns.lineplot(x='Year', y='Difference', hue='Continent', data=pivot_df, palette='Set2', marker='o')
plt.title('Difference Between Male and Female Wage and Salaried Workers Over Time by Continent')
plt.ylabel('Difference (Male - Female, percentage points)')
plt.xlabel('Year')
plt.show()
In [16]:
# Analysis of the data to provide a detailed explanation
# Descriptive statistics for the 'Difference' column
difference_stats = pivot_df['Difference'].describe()

# Explore how the difference has changed over the years for each continent
# Calculate the average difference for each continent over the years
average_difference_by_continent = pivot_df.groupby('Continent')['Difference'].mean()

# Trend over the years for the global average difference
average_difference_by_year = pivot_df.groupby('Year')['Difference'].mean()

print(difference_stats)
count    186.000000
mean       0.665334
std        5.615572
min       -6.794273
25%       -4.197944
50%        0.390092
75%        1.670925
max       12.262240
Name: Difference, dtype: float64
In [17]:
average_difference_by_continent
Out[17]:
Continent
Africa           11.678033
Asia              1.438324
Europe           -5.589505
North America    -4.334177
Oceania           0.637593
South America     0.161737
Name: Difference, dtype: float64

Descriptive Statistics of the 'Difference' Variable:

  • The mean difference (male minus female) is approximately 0.67 percentage points across all continent-year observations, so the male share of wage and salaried workers is slightly higher on average.
  • The standard deviation is around 5.62 percentage points, suggesting notable variation across continents and years.
  • The minimum and maximum values are -6.79 and 12.26 percentage points, respectively, indicating sizable disparities in both directions.

Average Difference by Continent:

  • Africa shows the largest average difference, around 11.68 percentage points. Because the difference is computed as male minus female, this means a larger share of employed men than of employed women hold wage and salaried jobs in African countries.
  • Europe and North America exhibit negative averages (-5.59 and -4.33, respectively), indicating that a higher percentage of employed women than of employed men are wage and salaried workers.
  • Asia, Oceania, and South America display smaller differences, with Asia's average (1.44) slightly favoring men.
  • In some African societies, traditional gender roles have historically assigned specific occupations to men and women. Women may be expected to pursue careers in fields such as education, healthcare, or social services, which are often lower-paying compared to male-dominated professions.
  • North America has also experienced historical gender roles, but the impact of these roles has evolved over time. In the mid-20th century, there was a significant shift as women entered the workforce in larger numbers, challenging traditional norms.
  • The feminist movement and subsequent legal changes, such as anti-discrimination laws, have played a crucial role in challenging gender norms in North America. This has led to increased opportunities for women in various professions.
  • Asia is a vast and diverse continent with a wide range of cultural practices and societal norms. While some countries in East Asia, like Japan and South Korea, may face challenges related to traditional gender roles, other regions, such as Southeast Asia, may have different dynamics influenced by local cultures.
  • The line chart depicting the differences between male and female wage and salaried workers across continents over time serves as the cornerstone of our exploration.

  • The line chart presents a compelling visual narrative of the gender gap in wage and salaried employment. It compares the percentages of wage and salaried workers between men and women across various continents, spanning multiple years. This visualization is more than an aggregation of statistics; it is a mirror reflecting the pervasive disparities in employment opportunities between genders.
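Since the sign of `Difference` drives the whole interpretation, it may help to check it on a tiny example. The sketch below uses made-up values and hypothetical continents `X` and `Y`; only the two indicator names are taken from the notebook. It suggests that the positional formula equals male minus female, because `pivot_table` orders the indicator columns alphabetically.

```python
import pandas as pd

# Indicator names as used in the notebook; all values below are hypothetical
female = 'Wage and salaried workers, female (% of female employment) (modeled ILO estimate)'
male = 'Wage and salaried workers, male (% of male employment) (modeled ILO estimate)'
toy_long = pd.DataFrame({
    'Year': [2020, 2020, 2020, 2020],
    'Continent': ['X', 'X', 'Y', 'Y'],
    'Indicator Name': [female, male, female, male],
    'value': [20.0, 30.0, 90.0, 85.0],
})

toy_pivot = toy_long.pivot_table(index=['Year', 'Continent'],
                                 columns='Indicator Name', values='value').reset_index()

# pivot_table sorts the indicator columns alphabetically, so "female" precedes "male"
print(list(toy_pivot.columns[2:]))

# Positional formula from the notebook vs. an explicit, name-based version
positional = -(toy_pivot.iloc[:, 2] - toy_pivot.iloc[:, 3])
by_name = toy_pivot[male] - toy_pivot[female]
print(positional.tolist() == by_name.tolist())
print(by_name.tolist())
```

Selecting the columns by name, as in `by_name`, avoids depending on column order and makes the direction of the subtraction explicit.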

In [18]:
# Re-creating the plot for average difference by year with proper labels
plt.figure(figsize=(20, 8))
average_difference_by_year.plot(kind='line', marker='o', color='b')

# Adding title and labels for clarity
plt.title('Global Average Difference Between Male and Female Wage and Salaried Workers Over Time')
plt.ylabel('Average Difference (Male - Female, percentage points)')
plt.xlabel('Year')

# Display the updated plot
plt.show()

Trend Over the Years:

  • The plotted trend of the average difference over the years showcases how this gender gap has evolved.
  • There are fluctuations, indicating that the gap has varied considerably over time.
  • Some years exhibit higher discrepancies, while others show a more balanced scenario between male and female wage and salaried workers.
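One way to make the trend description less impressionistic is to summarize the yearly series with a fitted slope and the largest year-over-year change. This is only a sketch: `toy_by_year` is a hypothetical stand-in for `average_difference_by_year`, its values are invented, and it assumes the years are numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly averages standing in for average_difference_by_year
toy_by_year = pd.Series([2.0, 1.6, 0.8, 0.7, 0.4],
                        index=pd.Index([2000, 2005, 2010, 2015, 2020], name='Year'))

# Least-squares line: the slope is the average change in the gap per year
years = toy_by_year.index.to_numpy(dtype=float)
slope, intercept = np.polyfit(years, toy_by_year.to_numpy(), deg=1)

# The largest absolute change between consecutive observations marks the sharpest shift
largest_jump_year = toy_by_year.diff().abs().idxmax()

print(f"slope per year: {slope:.3f}")
print(f"sharpest change ends in: {largest_jump_year}")
```

A negative slope would indicate a narrowing male-minus-female gap over the period; the straight-line fit is a simplification and would hide any reversal in the middle of the series.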

Interpretation and Insights:

  • The variations in these differences across continents could be reflective of cultural, economic, and legislative factors influencing gender roles in the workforce.
  • The fluctuating trend over the years might correspond to global socio-economic changes, policy reforms, and shifts in societal attitudes towards gender equality in employment.
  • Regions like Africa, where the average difference favors men, could be influenced by economic structures in which many women work outside wage employment, for example in own-account or family work.
  • Conversely, Europe and North America, with negative averages (a higher female share in wage and salaried work), may have different economic structures, cultural norms, or historical contexts influencing these trends.

Conclusion:

  • This analysis highlights the complexities and regional variations in gender employment disparities. Understanding these dynamics is crucial for formulating effective policies and interventions aimed at achieving gender equality in the labor market.

Correlation between labor participation rates and birth rates¶

In [19]:
continent_list = list(all_indicators_df.index.get_level_values(0).unique())
color_list = ['red', 'yellow', 'blue', 'green', 'black', 'gray']

plt.figure(figsize=(13, 9))

for (a, b) in zip(continent_list, color_list):
    sns.regplot(x='Birth rate, crude (per 1,000 people)',
                y='Labor force participation rate, female (% of female population ages 15+) (modeled ILO estimate)',
                data=all_indicators_df_log[all_indicators_df_log.index.isin([a], level=0)],
                scatter=False, color=b, line_kws={'linewidth': 5}, ci=None, label=a)

plt.title('Correlation between Birth Rate and Female Labor Force Participation Rate (log2 scale)')
plt.xlabel('log2 of Birth rate, crude (per 1,000 people)')
plt.ylabel('log2 of Labor force participation rate, female (% of female population ages 15+) (modeled ILO estimate)')
plt.grid(True)

plt.legend()

plt.show()
  • In our study, we plotted the correlation between labor participation rates and birth rates across various continents. The resulting chart presents a striking divergence in trends between Africa and Asia. In the African context, there is a notable positive correlation: as the birth rate increases, the labor participation rate also rises significantly. This trend suggests a robust engagement of women in the workforce, even with higher birth rates.

  • Conversely, the situation in Asia exhibits a starkly opposite pattern. Here, an increase in birth rates correlates with a significant decrease in women's labor participation. This trend may indicate societal or economic factors influencing women's ability to engage in the workforce as family size grows.

  • In contrast to these pronounced trends in Africa and Asia, other continents do not exhibit significant changes in labor participation rates relative to variations in birth rates. This observation implies that the relationship between birth rates and women's employment is less pronounced or is influenced by a different set of factors in these regions.

  • Overall, this analysis underscores the complex and region-specific nature of the relationship between birth rates and women's labor participation, highlighting the need for tailored policy approaches to address these varied dynamics.
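The regression lines show direction but not strength. A per-continent correlation coefficient could back up the comparison, along the lines of the sketch below. The continents `P` and `Q`, the countries, and all values are hypothetical; only the two indicator names come from the notebook.

```python
import numpy as np
import pandas as pd

birth = 'Birth rate, crude (per 1,000 people)'
lfpr = 'Labor force participation rate, female (% of female population ages 15+) (modeled ILO estimate)'

# Hypothetical values for two made-up continents, indexed like all_indicators_df
index = pd.MultiIndex.from_product([['P', 'Q'], ['c1', 'c2', 'c3']],
                                   names=['Continent', 'Country Name'])
toy = pd.DataFrame({birth: [10.0, 20.0, 40.0, 10.0, 20.0, 40.0],
                    lfpr: [40.0, 50.0, 70.0, 60.0, 45.0, 30.0]}, index=index)

# All toy values are positive, so log2 is defined everywhere
toy_log = np.log2(toy)

# Pearson correlation of the two log-transformed series within each continent
corr_by_continent = toy_log.groupby(level='Continent').apply(
    lambda g: g[birth].corr(g[lfpr])
)
print(corr_by_continent)
```

Coefficients near zero would support the reading that birth rates and participation are only weakly related in a region, while values near +1 or -1 would match the patterns described for Africa and Asia.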

Economic Development and Women at Work: A Snapshot Across Time¶

In [20]:
year_list = ['1991', '2001', '2011', '2021']

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=year_list,
    shared_xaxes=False,
    shared_yaxes=False,
    horizontal_spacing=0.1,
    vertical_spacing=0.2,
)

for i, year in enumerate(year_list, start=1):
    # Filtering the DataFrame for the current year
    all_indicators_df_log_year = all_indicators_df_log[
        all_indicators_df_log.index.isin([year], level=2)
    ].reset_index().copy()

    scatter = px.scatter(
        data_frame=all_indicators_df_log_year,
        x='GDP per capita (constant 2010 US$)',
        y='Labor force participation rate, female (% of female population ages 15+) (modeled ILO estimate)',
        trendline="lowess",
        trendline_options=dict(frac=0.4),
    )

    scatter.update_traces(
        marker=dict(size=8, color='orange'),
        line=dict(color='blue', width=2),
        selector=dict(mode='markers')
    )

    # Map the 1-based loop index to a grid cell so each panel lines up with its subplot title
    row, col = (i - 1) // 2 + 1, (i - 1) % 2 + 1
    trendline_data = scatter['data'][1]
    fig.add_trace(scatter['data'][0], row=row, col=col)
    fig.add_trace(go.Scatter(x=trendline_data['x'], y=trendline_data['y'],
                             mode='lines', line=dict(color='blue', width=2)),
                  row=row, col=col)

fig.update_layout(
    title_text="GDP per Capita vs Female Labor Force Participation (1991, 2001, 2011, 2021)",
    showlegend=False,
    width=800,
    height=800,
)
fig.show()

Connecting the U-shaped curve observed in our scatter plot visualization to the one mentioned in the research paper:

  • The observed U-shaped curve in our scatter plot analysis, which aligns with the historical trends documented in the research paper, offers profound insights into the evolving dynamics of women's labor force participation globally. This U-curve pattern reveals a multifaceted narrative of women's engagement in the labor market across varying stages of economic development.

  • In less developed economies, the high participation of women in the labor force, as noted in the left arm of the U-curve, is predominantly driven by economic necessity. Women, in these contexts, are compelled to contribute to the workforce to support their families and communities, often in lower-paying and less skilled jobs.

  • As economies progress and educational and professional opportunities enhance, an initial decline in women’s labor force participation is observed, denoted by the curve's dip. This phenomenon can be attributed to various factors, including increased household income alleviating the immediate need for dual-income families and the transitional phase of women pursuing higher education and professional training.

  • However, the upward trajectory of the U-curve in more developed economies, consistent with historical data on high-income countries, signifies a resurgence in women's labor force participation. This increase is reflective not just of economic development, but also of a cultural shift towards recognizing and valuing women's roles in professional and skilled sectors. It suggests an environment where women are increasingly able to leverage educational advancements and societal changes to access better employment opportunities.

  • The parallel observations from our scatter plot and the research paper elucidate that while women's labor force participation has indeed increased over time, especially in developed economies, this does not directly translate to gender equality. Persistent disparities in earnings, job positions, and representation in leadership roles continue to challenge the narrative of equality within the workforce.

  • In conclusion, the U-curve phenomenon encapsulates a complex interplay of economic, cultural, and social factors influencing women's labor market participation. It serves as a reminder that while progress has been made, comprehensive strategies addressing both economic development and cultural norms are essential to truly achieving gender parity in the labor force.
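The U shape above is read off a LOWESS trendline by eye. A simple numerical cross-check, sketched here with invented data, is to fit a quadratic instead and test whether it opens upward with its minimum inside the observed range. Note that this swaps in a quadratic fit for the LOWESS smoother used in the notebook; the arrays are hypothetical.

```python
import numpy as np

# Hypothetical (log2 GDP per capita, female participation %) pairs tracing a U shape
log_gdp = np.array([8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
participation = np.array([70.0, 58.0, 50.0, 45.0, 44.0, 48.0, 55.0, 63.0])

# Fit participation = a * x^2 + b * x + c; a > 0 means the curve opens upward
a, b, c = np.polyfit(log_gdp, participation, deg=2)

# The fitted minimum sits at x = -b / (2a); for a U shape it should fall inside the data range
turning_point = -b / (2 * a)
is_u_shaped = bool(a > 0 and log_gdp.min() < turning_point < log_gdp.max())

print(f"a = {a:.2f}, turning point at log2 GDP = {turning_point:.2f}, U-shaped: {is_u_shaped}")
```

Running the same check on each of the four years would show whether the curvature is stable over time or specific to certain snapshots.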

Women's Empowerment Journey Across the Globe¶

We explored how women's empowerment differs around the world, so we dived into the "Women, Business, and the Law" dataset from 1971 to 2023. Our focus was on six important indicators like job opportunities, legal protection, and more, each shedding light on gender equality.

Using Plotly, the code creates a map showing Women, Business, and the Law (WBL) Index scores for different countries in 2021. With the Viridis colorscale, lighter (yellow) colors mean higher equality, while darker (purple) ones mark areas that could use more attention.

Findings from the Indicators¶

With the help of Plotly, we found that some countries were doing well on gender equality, while others needed more support.

  • Job Opportunities: Can a woman get a job in the same way as a man?
  • Legal Protection: Does the law stand against discrimination in employment based on gender?
  • Night Work: Can a woman work at night just like a man?
  • Risk and Danger: Can a woman work in a job considered dangerous, just like a man?
  • Industrial Jobs: Can a woman take on industrial jobs similarly to a man?
  • Business Ownership: Can a woman register a business in the same way as a man?

The Magic of 2021:

Our code chose the magical year of 2021 to cast its spell. It filtered the data to focus solely on this year, uncovering insights that would shape our understanding of gender equality.

In [21]:
file_path = r'WBL-1971-2023-Dataset.xlsx'

df_wbl_data = pd.read_excel(file_path)

# Filter the data for the year 2021
data_2021 = df_wbl_data[df_wbl_data['Report Year'] == 2021]

# Create a choropleth map using plotly
fig = go.Figure(data=go.Choropleth(
    locations=data_2021['ISO Code'],
    z=data_2021['WBL INDEX'],
    text=data_2021['Economy'],
    colorscale='Viridis',  # You can choose a different colorscale
    colorbar=dict(title='WBL Index'),
))

fig.update_layout(
    title='WBL Index Rates for Different Countries in 2021',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='natural earth'
    )
)

fig.show()

Belgium has one of the highest WBL Index scores, while Sudan has one of the lowest:

  • Belgium may have comprehensive non-discrimination laws that explicitly prohibit discrimination based on gender in various aspects of life, including employment.

  • Proactive government initiatives and programs aimed at promoting gender equality and women's empowerment can significantly contribute to a higher WBL Index.

  • Progressive policies within companies, including gender diversity on boards and leadership positions, can contribute to a higher WBL Index.

  • Sudan may have limited or inadequate legal frameworks that address gender-based discrimination, harassment, and equal opportunities for women in various aspects of life, including the workplace.

  • A significant gender wage gap, where women earn less than men for similar work, can contribute to lower WBL Index scores.

  • Occupational Segregation: Concentration of women in lower-paying and less prestigious occupations compared to men can be a factor.

  • International Standing: Limited involvement in international agreements and commitments related to gender equality may contribute to a lower global ranking.
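The highest and lowest scorers can be read from the data directly rather than from the map. The sketch below reuses the `Economy`, `Report Year`, and `WBL INDEX` column names from the cell above, but the rows and scores are made up for illustration.

```python
import pandas as pd

# Hypothetical rows using the column names from the WBL cell; the scores are invented
toy_wbl = pd.DataFrame({
    'Economy': ['A', 'B', 'C', 'D'],
    'Report Year': [2021, 2021, 2021, 2020],
    'WBL INDEX': [100.0, 55.0, 30.0, 10.0],
})

# Same year filter as in the notebook
toy_2021 = toy_wbl[toy_wbl['Report Year'] == 2021]

# keep='all' retains every economy tied at the cut-off instead of picking one arbitrarily
top = toy_2021.nlargest(1, 'WBL INDEX', keep='all')[['Economy', 'WBL INDEX']]
bottom = toy_2021.nsmallest(1, 'WBL INDEX', keep='all')[['Economy', 'WBL INDEX']]

print(top)
print(bottom)
```

Using `keep='all'` matters when several economies share the maximum score, since a plain top-1 lookup would then report only one of them.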

Conclusion¶

Measures for Future Improvements:

  1. Policy Reforms: Implement and strengthen policies that address gender disparities in the workplace, ensuring equal opportunities and fair remuneration.
  2. Education and Training: Invest in education and training programs to equip women with skills relevant to emerging industries and promote STEM education.
  3. Family-Friendly Policies: Enforce family-friendly policies, such as flexible work schedules and parental leave, to support work-life balance.
  4. Promote Entrepreneurship: Facilitate programs supporting female entrepreneurship, including mentorship and access to funding.
  5. Enhance Workplace Culture: Foster inclusive workplace cultures, eliminate biases in hiring and promotions, and encourage mentorship programs.

The analysis underscores the need for targeted and comprehensive strategies to address gender disparities in the global labor market. From understanding wage gaps to recognizing the nuanced relationship between birth rates and female labor force participation, it is evident that no one-size-fits-all solution exists. Future improvements require collaborative efforts, including policy reforms, educational initiatives, and cultural shifts, to pave the way for a more equitable and inclusive global workforce. Emphasizing women's empowerment and dismantling barriers will not only narrow existing gaps but also contribute to sustainable economic development and social progress on a global scale.

References¶

  • Gender Data Portal From the World Bank: https://genderdata.worldbank.org/
  • Nobel Prize Press Release: https://www.nobelprize.org/prizes/economic-sciences/2023/press-release/
  • Nobel Prize Paper: https://www.nobelprize.org/uploads/2023/10/advanced-economicsciencesprize2023.pdf
  • Study of Differences by Sexual Orientation: https://williamsinstitute.law.ucla.edu/wp-content/uploads/National-LGBT-Poverty-Oct-2019.pdf
  • Wage Gap Among LGBTQ+ Workers in the United States: https://www.hrc.org/resources/the-wage-gap-among-lgbtq-workers-in-the-united-states